
1
=1
For $y \in \{1, \dots, n\}$, only $n - 1$ probabilities need to be specified:

$$p_i = P(y = i \mid x), \qquad P(y = n \mid x) = 1 - \sum_{i=1}^{n-1} P(y = i \mid x).$$
n
$$p = \operatorname{softmax}(a) \iff p_i = \frac{e^{a_i}}{\sum_j e^{a_j}}.$$
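The softmax definition above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `softmax` and the example input are chosen here, and subtracting the maximum activation is a common numerical-stability trick that is assumed rather than taken from the text (it cancels in the ratio, so the result is unchanged).

```python
import math

def softmax(a):
    """Return p with p_i = exp(a_i) / sum_j exp(a_j)."""
    # Assumption: shift by max(a) to avoid overflow; the shift cancels
    # between numerator and denominator, so p is mathematically unchanged.
    m = max(a)
    exps = [math.exp(ai - m) for ai in a]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical activations, for illustration only.
p = softmax([1.0, 2.0, 3.0])
p_shifted = softmax([1001.0, 1002.0, 1003.0])
```

The entries of `p` are positive and sum to 1, and adding the same constant to every activation (as in `p_shifted`) gives the same probabilities.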
a
a = b+W h a
n
n−1 a
a
i
i
$$L_{\mathrm{NLL}}(p, y) = -\log p_y$$
a
y
a
y
y a
i
i = y x
p
y
= 1 x p
i
= i x
a
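$$
\begin{aligned}
\frac{\partial}{\partial a_k} L_{\mathrm{NLL}}(p, y)
&= \frac{\partial}{\partial a_k} \left(-\log p_y\right)
 = \frac{\partial}{\partial a_k} \Bigl(-a_y + \log \sum_j e^{a_j}\Bigr) \\
&= -\mathbf{1}_{y=k} + \frac{e^{a_k}}{\sum_j e^{a_j}}
 = p_k - \mathbf{1}_{y=k}
\end{aligned}
$$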
or
$$\frac{\partial}{\partial a} L_{\mathrm{NLL}}(p, y) = p - e_y$$
where $e_y = [0, \dots, 0, 1, 0, \dots, 0]$ is the one-hot vector with a 1 at position $y$.
x a a
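The gradient of the negative log-likelihood with respect to the activations, $p - e_y$, can be checked numerically. The sketch below compares that analytic gradient against central finite differences. All function names, the activation values, and the step size `eps` are illustrative assumptions, and the class index `y` is 0-based here, whereas the text counts classes from 1.

```python
import math

def softmax(a):
    """p_i = exp(a_i) / sum_j exp(a_j), shifted by max(a) for stability."""
    m = max(a)
    exps = [math.exp(ai - m) for ai in a]
    total = sum(exps)
    return [e / total for e in exps]

def nll(a, y):
    """-log p_y, written as -a_y + log sum_j exp(a_j)."""
    m = max(a)
    log_sum = m + math.log(sum(math.exp(ai - m) for ai in a))
    return -a[y] + log_sum

def nll_grad(a, y):
    """Analytic gradient with respect to a: p - e_y."""
    p = softmax(a)
    return [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]

def numeric_grad(a, y, eps=1e-6):
    """Central finite-difference estimate of the same gradient."""
    grad = []
    for k in range(len(a)):
        up = list(a)
        up[k] += eps
        down = list(a)
        down[k] -= eps
        grad.append((nll(up, y) - nll(down, y)) / (2 * eps))
    return grad

# Hypothetical activations and target class (0-based index).
a = [0.5, -1.2, 2.0, 0.1]
y = 2
analytic = nll_grad(a, y)
numeric = numeric_grad(a, y)
max_err = max(abs(g - h) for g, h in zip(analytic, numeric))
```

The two gradients should agree to within finite-difference error. The analytic components sum to zero because the entries of $p$ sum to 1, and the component for the target class is negative, so a gradient-descent step raises $a_y$.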